Generalized Gamma distribution (gengamma)#
The generalized gamma distribution (scipy.stats.gengamma) is a flexible continuous distribution on \((0,\infty)\).
It is widely used for modeling positive data (lifetimes, waiting times, sizes) because it can interpolate between several classic families.
A key identity makes it easy to work with: after removing location and scale, \(\left(\frac{X-\mathrm{loc}}{\mathrm{scale}}\right)^c \sim \mathrm{Gamma}(a,1)\).
Equivalently, in the standard form (loc=0, scale=1) we have \(X^c \sim \mathrm{Gamma}(a,1)\).
What you’ll learn#
- the PDF/CDF and the role of the incomplete gamma function
- moment conditions and closed-form raw moments
- how parameters \(a\), \(c\), and scale change the shape
- a NumPy-only sampler (via Gamma sampling) + Monte Carlo validation
- practical usage via scipy.stats.gengamma (pdf, cdf, rvs, fit)
import numpy as np
import plotly
import plotly.graph_objects as go
import os
import plotly.io as pio
import scipy
from scipy import special, stats
from scipy.stats import gengamma as gengamma_dist
# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
# Reproducibility
rng = np.random.default_rng(42)
np.set_printoptions(precision=5, suppress=True)
# Record versions for reproducibility (useful when numerical details matter).
VERSIONS = {"numpy": np.__version__, "scipy": scipy.__version__, "plotly": plotly.__version__}
1) Title & Classification#
- Name: gengamma (generalized gamma; SciPy: scipy.stats.gengamma)
- Type: Continuous
- Support (standard form): \(x \in (0,\infty)\)
- Parameter space (standard form): shape parameters \(a>0\), \(c\ne 0\)
- SciPy location/scale: \(\mathrm{loc} \in \mathbb{R}\), \(\mathrm{scale} > 0\), with

$$X = \mathrm{loc} + \mathrm{scale}\,Y, \qquad Y \sim \mathrm{GenGamma}(a,c).$$

The support becomes \(x>\mathrm{loc}\).
Unless stated otherwise, we work with the standard form (loc=0, scale=1).
2) Intuition & Motivation#
What it models#
The generalized gamma is a shape-flexible model for positive data. Depending on its parameters it can resemble a Gamma- or Weibull-like right-skewed distribution, and for negative values of \(c\) it can produce very heavy (polynomial) tails.
A very useful lens is a power transform: after adjusting for scale, the random variable \(X^c\) is Gamma-distributed.
That means many properties can be derived by reducing the problem to the Gamma distribution.
Typical real-world use cases#
Reliability / survival analysis: time-to-failure, repair times, waiting times.
Hydrology / climate: rainfall amounts and other strictly-positive quantities.
Economics / operations: positive durations and sizes with flexible skew/tails.
Generative modeling: a flexible positive noise model or a component in mixture models.
Relations to other distributions (special cases)#
In the standard form (loc=0, scale=1) for \(c>0\):
\(c = 1\) gives the Gamma distribution: \(X \sim \mathrm{Gamma}(a,1)\).
\(a = 1\) gives the Weibull distribution: \(X \sim \mathrm{Weibull}(k=c,\lambda=1)\).
\(a = 1,\; c = 1\) gives the Exponential distribution (rate 1).
\(c = 2\) with \(a = \nu/2\) and scale \(=\sqrt{2}\) yields the Chi distribution with \(\nu\) degrees of freedom.
More generally, \(Y=(X/\mathrm{scale})^c \sim \mathrm{Gamma}(a,1)\), which is the workhorse relationship we will exploit.
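The special-case identities above can be spot-checked numerically. The following sketch compares PDFs on a grid with SciPy's standard parameterizations (the particular parameter values are arbitrary choices):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5.0, 50)

# c = 1 -> Gamma(a)
assert np.allclose(stats.gengamma.pdf(x, 2.5, 1.0), stats.gamma.pdf(x, 2.5))

# a = 1 -> Weibull with shape k = c (for c > 0)
assert np.allclose(stats.gengamma.pdf(x, 1.0, 1.7), stats.weibull_min.pdf(x, 1.7))

# a = nu/2, c = 2, scale = sqrt(2) -> Chi(nu); here nu = 3
assert np.allclose(stats.gengamma.pdf(x, 1.5, 2.0, scale=np.sqrt(2.0)), stats.chi.pdf(x, 3))

print("all special-case identities hold on the grid")
```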
3) Formal Definition#
PDF (standard form)#
For \(x>0\), \(a>0\), \(c\ne 0\), the generalized gamma PDF is

$$f(x; a, c) = \frac{|c|\, x^{ca-1}\, e^{-x^c}}{\Gamma(a)}.$$
PDF with loc/scale#
SciPy uses the location/scale transform \(x = \mathrm{loc} + \mathrm{scale}\,y\) with scale \(>0\).
Writing \(y=(x-\mathrm{loc})/\mathrm{scale}\),

$$f(x; a, c, \mathrm{loc}, \mathrm{scale}) = \frac{|c|\, y^{ca-1}\, e^{-y^c}}{\mathrm{scale}\,\Gamma(a)}, \qquad x > \mathrm{loc}.$$
CDF#
Let \(u = \left(\frac{x-\mathrm{loc}}{\mathrm{scale}}\right)^c\) for \(x>\mathrm{loc}\), and let
\(\gamma(a,u) = \int_0^u t^{a-1}e^{-t}\,dt\) be the lower incomplete gamma,
\(\Gamma(a,u) = \int_u^\infty t^{a-1}e^{-t}\,dt\) be the upper incomplete gamma,
\(P(a,u)=\gamma(a,u)/\Gamma(a)\) and \(Q(a,u)=\Gamma(a,u)/\Gamma(a)=1-P(a,u)\) be the regularized versions.
Then

$$F(x) = \begin{cases} P(a, u), & c > 0, \\ Q(a, u), & c < 0. \end{cases}$$
The sign dependence comes from the fact that \(x \mapsto x^c\) is increasing for \(c>0\) and decreasing for \(c<0\).
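These identities can be checked directly with scipy.special (a quick sketch; the values of \(a\), \(c\), and the grid are arbitrary):

```python
import numpy as np
from scipy import special, stats

a = 2.3
u = np.array([0.1, 1.0, 5.0])

# The regularized incomplete gamma functions satisfy P(a,u) + Q(a,u) = 1.
assert np.allclose(special.gammainc(a, u) + special.gammaincc(a, u), 1.0)

# For c < 0 the map x -> x^c is decreasing, so F(x) = Q(a, x^c).
c = -1.5
x = np.array([0.5, 1.0, 2.0])
assert np.allclose(stats.gengamma.cdf(x, a, c), special.gammaincc(a, x**c))
```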
def _validate_params(a: float, c: float, scale: float) -> None:
if not (a > 0):
raise ValueError("Parameter 'a' must be > 0")
if c == 0:
raise ValueError("Parameter 'c' must be non-zero")
if not (scale > 0):
raise ValueError("Parameter 'scale' must be > 0")
def gengamma_logpdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
"""Log-PDF of SciPy's generalized gamma parameterization.
Standard form (loc=0, scale=1):
f(x; a,c) = |c| x^{ca-1} exp(-x^c) / Gamma(a), x>0
Notes:
- Works for any real c != 0.
- Uses log space for numerical stability.
"""
a = float(a)
c = float(c)
loc = float(loc)
scale = float(scale)
_validate_params(a, c, scale)
x = np.asarray(x, dtype=float)
y = (x - loc) / scale
out = np.full_like(y, -np.inf, dtype=float)
mask = y > 0
if not np.any(mask):
return out
ym = y[mask]
with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
out[mask] = (
np.log(abs(c))
- np.log(scale)
- special.gammaln(a)
+ (c * a - 1.0) * np.log(ym)
- np.power(ym, c)
)
return out
def gengamma_pdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
return np.exp(gengamma_logpdf(x, a, c, loc=loc, scale=scale))
def gengamma_cdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
"""CDF expressed via regularized incomplete gamma functions.
For u = ((x-loc)/scale)^c:
- if c>0: F = P(a,u) = gammainc(a,u)
- if c<0: F = Q(a,u) = gammaincc(a,u)
"""
a = float(a)
c = float(c)
loc = float(loc)
scale = float(scale)
_validate_params(a, c, scale)
x = np.asarray(x, dtype=float)
y = (x - loc) / scale
out = np.zeros_like(y, dtype=float)
mask = y > 0
if not np.any(mask):
return out
ym = y[mask]
u = np.power(ym, c)
if c > 0:
out[mask] = special.gammainc(a, u)
else:
out[mask] = special.gammaincc(a, u)
# Defensive clipping for tiny numerical drift.
return np.clip(out, 0.0, 1.0)
def gengamma_raw_moment(k: float, a: float, c: float, scale: float = 1.0) -> float:
"""Raw moment E[X^k] for loc=0.
Exists when a + k/c > 0.
"""
a = float(a)
c = float(c)
scale = float(scale)
_validate_params(a, c, scale)
t = a + k / c
if t <= 0:
return float("nan")
return float(np.exp(special.gammaln(t) - special.gammaln(a) + k * np.log(scale)))
def gengamma_mvsk(a: float, c: float, scale: float = 1.0) -> tuple[float, float, float, float]:
"""Mean/variance/skew/(excess)kurtosis for loc=0, if they exist."""
m1 = gengamma_raw_moment(1, a, c, scale)
m2 = gengamma_raw_moment(2, a, c, scale)
m3 = gengamma_raw_moment(3, a, c, scale)
m4 = gengamma_raw_moment(4, a, c, scale)
mean = m1
var = m2 - m1**2
if not np.isfinite(var) or var <= 0:
return mean, var, float("nan"), float("nan")
mu3 = m3 - 3 * m2 * mean + 2 * mean**3
mu4 = m4 - 4 * m3 * mean + 6 * m2 * mean**2 - 3 * mean**4
skew = mu3 / var ** 1.5
kurt_excess = mu4 / var**2 - 3.0
return mean, var, skew, kurt_excess
def gengamma_entropy(a: float, c: float, scale: float = 1.0) -> float:
"""Differential entropy for loc=0, scale=scale."""
a = float(a)
c = float(c)
scale = float(scale)
_validate_params(a, c, scale)
return float(
np.log(scale)
- np.log(abs(c))
+ special.gammaln(a)
+ a
- (a - 1.0 / c) * special.digamma(a)
)
def gengamma_rvs_numpy(
a: float,
c: float,
loc: float = 0.0,
scale: float = 1.0,
size: int | tuple[int, ...] = 1,
rng: np.random.Generator | None = None,
) -> np.ndarray:
"""NumPy-only sampler using the Gamma transform.
Algorithm:
1) draw Y ~ Gamma(a, 1) using NumPy
2) return X = loc + scale * Y^(1/c)
"""
a = float(a)
c = float(c)
loc = float(loc)
scale = float(scale)
_validate_params(a, c, scale)
rng = np.random.default_rng() if rng is None else rng
y = rng.gamma(shape=a, scale=1.0, size=size)
return loc + scale * np.power(y, 1.0 / c)
# Sanity check: our formulas match SciPy.
a, c, loc, scale = 2.3, 1.7, 0.0, 0.8
x = np.logspace(-3, 2, 9)
pdf_ours = gengamma_pdf(x, a, c, loc=loc, scale=scale)
cdf_ours = gengamma_cdf(x, a, c, loc=loc, scale=scale)
pdf_scipy = gengamma_dist.pdf(x, a, c, loc=loc, scale=scale)
cdf_scipy = gengamma_dist.cdf(x, a, c, loc=loc, scale=scale)
print("max |pdf diff|:", float(np.max(np.abs(pdf_ours - pdf_scipy))))
print("max |cdf diff|:", float(np.max(np.abs(cdf_ours - cdf_scipy))))
print("allclose?", np.allclose(pdf_ours, pdf_scipy) and np.allclose(cdf_ours, cdf_scipy))
max |pdf diff|: 1.1102230246251565e-16
max |cdf diff|: 0.0
allclose? True
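The sanity check above used a positive \(c\); as an extra hedged check, the closed-form PDF also matches SciPy for a negative exponent (parameter values are arbitrary):

```python
import numpy as np
from scipy import special, stats

a, c = 2.0, -1.5
x = np.logspace(-1, 1, 9)

# Closed form (standard form): f(x) = |c| x^{ca-1} exp(-x^c) / Gamma(a)
pdf_closed = np.abs(c) * x**(c * a - 1.0) * np.exp(-(x**c)) / special.gamma(a)
assert np.allclose(pdf_closed, stats.gengamma.pdf(x, a, c))
```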
4) Moments & Properties#
Raw moments#
For loc=0 and scale \(=\beta\), if \(a + k/c > 0\),

$$m_k = \mathbb{E}[X^k] = \beta^k\, \frac{\Gamma(a + k/c)}{\Gamma(a)}.$$
This follows immediately from \(Y=(X/\beta)^c \sim \mathrm{Gamma}(a,1)\).
Mean / variance#
When they exist (i.e. \(a + 1/c>0\) and \(a + 2/c>0\)):

$$\mathbb{E}[X] = \beta\,\frac{\Gamma(a+1/c)}{\Gamma(a)}, \qquad \mathrm{Var}(X) = \beta^2\left[\frac{\Gamma(a+2/c)}{\Gamma(a)} - \left(\frac{\Gamma(a+1/c)}{\Gamma(a)}\right)^{\!2}\right].$$
Skewness and (excess) kurtosis can be expressed using \(m_k=\mathbb{E}[X^k]\) for \(k\le 4\).
Existence of moments#
Because \(m_k\) requires \(a + k/c>0\), moment existence depends on the sign of \(c\):
for \(c>0\), all positive raw moments exist (since \(a>0\) and \(k/c>0\))
for \(c<0\), sufficiently high moments may diverge (heavy tail)
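A compact standalone check of the existence rule (this reuses the same gamma-function formula as gengamma_raw_moment above; the example values are arbitrary):

```python
import numpy as np
from scipy import special

def raw_moment(k: float, a: float, c: float) -> float:
    """E[X^k] for loc=0, scale=1: Gamma(a + k/c)/Gamma(a) if a + k/c > 0, else NaN."""
    t = a + k / c
    if t <= 0:
        return float("nan")
    return float(np.exp(special.gammaln(t) - special.gammaln(a)))

# With c < 0, only moments of order k < a*|c| exist; here a*|c| = 3.
a, c = 2.0, -1.5
m = [raw_moment(k, a, c) for k in (1, 2, 3, 4)]
assert np.isfinite(m[0]) and np.isfinite(m[1])  # k = 1, 2 exist
assert np.isnan(m[2]) and np.isnan(m[3])        # k = 3, 4 diverge
```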
MGF / characteristic function#
There is no simple closed form for \(M_X(t)=\mathbb{E}[e^{tX}]\) in general. A useful representation is the moment series

$$M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!}\, m_k,$$

when the series converges.
Rule of thumb for \(t>0\) (tail comparison):
\(c>1\): MGF exists for all real \(t\)
\(c=1\): Gamma case; MGF exists for \(t < 1/\beta\)
\(0<c<1\) or \(c<0\): MGF diverges for any \(t>0\) (tails are too heavy)
The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) always exists.
Entropy#
For loc=0 and scale \(=\beta\) the differential entropy is

$$h(X) = \ln\beta - \ln|c| + \ln\Gamma(a) + a - \left(a - \frac{1}{c}\right)\psi(a),$$

where \(\psi\) is the digamma function.
# Compare closed-form moments/entropy against SciPy.
a, c, scale = 4.42, -3.12, 2.0
mean, var, skew, kurt_excess = gengamma_mvsk(a, c, scale=scale)
mean_s, var_s, skew_s, kurt_excess_s = gengamma_dist.stats(a, c, scale=scale, moments="mvsk")
entropy = gengamma_entropy(a, c, scale=scale)
entropy_s = gengamma_dist.entropy(a, c, scale=scale)
print("mean:", mean, "(SciPy:", float(mean_s), ")")
print("var:", var, "(SciPy:", float(var_s), ")")
print("skew:", skew, "(SciPy:", float(skew_s), ")")
print("excess kurtosis:", kurt_excess, "(SciPy:", float(kurt_excess_s), ")")
print("entropy:", entropy, "(SciPy:", float(entropy_s), ")")
mean: 1.307135070065489 (SciPy: 1.3071350700654891 )
var: 0.04921521947520513 (SciPy: 0.049215219475204686 )
skew: 1.1262322238186198 (SciPy: 1.1262322238187161 )
excess kurtosis: 2.6645270107947754 (SciPy: 2.6645270107930443 )
entropy: -0.1699451239020231 (SciPy: -0.16994512390202365 )
# Empirical characteristic function via Monte Carlo (always exists).
a0, c0, scale0 = 2.0, 0.7, 1.0
samples = gengamma_rvs_numpy(a0, c0, scale=scale0, size=80_000, rng=rng)
t_grid = np.linspace(0, 12, 250)
phi_hat = np.array([np.mean(np.exp(1j * t * samples)) for t in t_grid])
fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=np.real(phi_hat), mode="lines", name="Re φ̂(t)"))
fig.add_trace(go.Scatter(x=t_grid, y=np.imag(phi_hat), mode="lines", name="Im φ̂(t)"))
fig.update_layout(
title="Empirical characteristic function of gengamma (Monte Carlo)",
xaxis_title="t",
yaxis_title="value",
width=900,
height=420,
)
fig
5) Parameter Interpretation#
SciPy’s gengamma uses two shape parameters a and c, plus the standard loc and scale.
The most useful mental model#
Let

$$Y = \left(\frac{X - \mathrm{loc}}{\mathrm{scale}}\right)^{c}.$$
Then \(Y \sim \mathrm{Gamma}(a,1)\). So you can interpret parameters through the Gamma distribution plus a power transform.
What the parameters do#
- scale (\(\beta\)): multiplies \(X\) and sets the overall units / typical magnitude.
- loc: shifts the distribution; the support becomes \(x>\mathrm{loc}\).
- a: the Gamma shape of \(Y\); larger a typically moves mass away from 0 and reduces relative variability.
- c: controls the power transform:
  - for \(c>1\) the tail is thinner (decays faster than exponential in \(x\))
  - for \(0<c<1\) the tail is heavier (stretched-exponential)
  - for \(c<0\) the tail becomes polynomial-heavy, and high-order moments may not exist
Below we visualize how changing a, c, and scale changes the PDF.
x = np.linspace(1e-4, 8, 800)
# 1) Vary a with c fixed (Gamma special case: c=1)
c_fixed = 1.0
fig1 = go.Figure()
for a in [0.7, 1.0, 2.0, 5.0]:
fig1.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a, c_fixed), mode="lines", name=f"a={a}"))
fig1.update_layout(
title="Effect of a (fix c=1 → Gamma family)",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig1
# 2) Vary c with a fixed
a_fixed = 2.0
fig2 = go.Figure()
for c in [0.5, 1.0, 2.0]:
fig2.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a_fixed, c), mode="lines", name=f"c={c}"))
fig2.update_layout(
title="Effect of c (fix a=2)",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig2
# 3) Scale stretches the x-axis
a, c = 2.0, 1.5
fig3 = go.Figure()
for s in [0.5, 1.0, 2.0]:
fig3.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a, c, scale=s), mode="lines", name=f"scale={s}"))
fig3.update_layout(
title="Effect of scale (fix a=2, c=1.5)",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig3
6) Derivations#
We focus on loc=0 for clarity. Adding loc is a simple shift.
6.1 PDF via a Gamma change of variables#
Let \(Y \sim \mathrm{Gamma}(a,1)\) with PDF

$$f_Y(y) = \frac{y^{a-1} e^{-y}}{\Gamma(a)}, \qquad y > 0.$$

Define the transformation

$$X = \beta\, Y^{1/c}, \qquad \text{i.e.} \qquad y = (x/\beta)^c.$$

The Jacobian is

$$\left|\frac{dy}{dx}\right| = \frac{|c|}{\beta}\left(\frac{x}{\beta}\right)^{c-1}.$$

Therefore

$$f_X(x) = f_Y\!\big((x/\beta)^c\big)\left|\frac{dy}{dx}\right| = \frac{|c|}{\beta\,\Gamma(a)}\left(\frac{x}{\beta}\right)^{ca-1} e^{-(x/\beta)^c}, \qquad x>0.$$
Setting \(\beta=1\) recovers the standardized PDF from SciPy’s docstring.
6.2 Expectation and variance#
Using the same transform,

$$\mathbb{E}[X^k] = \beta^k\, \mathbb{E}\big[Y^{k/c}\big] = \beta^k\, \frac{\Gamma(a + k/c)}{\Gamma(a)},$$

as long as \(a+k/c>0\). Mean and variance are obtained by plugging in \(k=1\) and \(k=2\).
6.3 Likelihood (i.i.d. sample)#
For observations \(x_1,\dots,x_n>0\) (with loc=0), the likelihood is

$$L(a, c, \beta) = \prod_{i=1}^{n} \frac{|c|}{\beta\,\Gamma(a)}\left(\frac{x_i}{\beta}\right)^{ca-1} e^{-(x_i/\beta)^c}.$$

The log-likelihood is

$$\ell(a, c, \beta) = n\big(\ln|c| - \ln\beta - \ln\Gamma(a)\big) + (ca-1)\sum_{i=1}^{n} \ln\frac{x_i}{\beta} - \sum_{i=1}^{n} \left(\frac{x_i}{\beta}\right)^{c}.$$
Closed-form MLEs are not available in general; numerical optimization is typically used (as in scipy.stats.gengamma.fit).
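Since the log-likelihood above is easy to evaluate, direct numerical maximization is straightforward. A hedged sketch (loc fixed at 0; the unconstrained reparameterization and the choice of Nelder-Mead are illustrative, not canonical):

```python
import numpy as np
from scipy import optimize, special, stats

# Simulate data from known parameters, then recover them by minimizing the
# negative log-likelihood in theta = (log a, c, log scale).
rng = np.random.default_rng(0)
data = stats.gengamma.rvs(3.0, 1.2, scale=2.0, size=2000, random_state=rng)

def nll(theta: np.ndarray) -> float:
    a, c, scale = np.exp(theta[0]), theta[1], np.exp(theta[2])
    if c == 0.0:
        return np.inf
    y = data / scale
    logpdf = (np.log(abs(c)) - np.log(scale) - special.gammaln(a)
              + (c * a - 1.0) * np.log(y) - y**c)
    return -float(np.sum(logpdf))

res = optimize.minimize(nll, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead",
                        options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-8})
a_hat, c_hat, scale_hat = np.exp(res.x[0]), res.x[1], np.exp(res.x[2])
print("fitted:", a_hat, c_hat, scale_hat)  # typically lands near (3.0, 1.2, 2.0)
```

As the Pitfalls section notes, the likelihood can have flat ridges in \((a, c)\), so good starting values matter in practice.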
7) Sampling & Simulation#
The Gamma transform gives an immediate sampling algorithm.
Algorithm (NumPy-only) for loc=0:
Sample \(Y \sim \mathrm{Gamma}(a,1)\) (NumPy can do this directly).
Return \(X = \mathrm{scale}\,Y^{1/c}\).
For general loc, shift: \(X = \mathrm{loc} + \mathrm{scale}\,Y^{1/c}\).
Below we validate sampling by checking:
sample mean/variance vs the closed forms
that \((X/\mathrm{scale})^c\) looks Gamma-distributed
a0, c0, scale0 = 3.0, 1.2, 2.0
samples = gengamma_rvs_numpy(a0, c0, scale=scale0, size=200_000, rng=rng)
mean_true, var_true, *_ = gengamma_mvsk(a0, c0, scale=scale0)
mean_hat = float(np.mean(samples))
var_hat = float(np.var(samples))
print("mean (MC) :", mean_hat)
print("mean (true):", mean_true)
print("var (MC) :", var_hat)
print("var (true):", var_true)
mean (MC) : 4.887246702960949
mean (true): 4.886184597056009
var (MC) : 5.527400457004914
var (true): 5.548009631523051
# Validate the transform: Y = (X/scale)^c should be Gamma(a,1)
y = np.power(samples / scale0, c0)
grid = np.linspace(0, np.quantile(y, 0.995), 700)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=y,
nbinsx=70,
histnorm="probability density",
name="Y=(X/scale)^c (Monte Carlo)",
opacity=0.55,
)
)
fig.add_trace(
go.Scatter(
x=grid,
y=stats.gamma(a0, scale=1.0).pdf(grid),
mode="lines",
name=f"Gamma(a={a0}, scale=1)",
line=dict(width=3),
)
)
fig.update_layout(
title="Power transform check: (X/scale)^c ~ Gamma(a,1)",
xaxis_title="y",
yaxis_title="density",
width=900,
height=420,
)
fig
8) Visualization#
We’ll compare:
the theoretical PDF and CDF
Monte Carlo samples (NumPy-only sampler)
SciPy’s implementation
# PDF + histogram (Monte Carlo)
x_grid = np.linspace(1e-4, np.quantile(samples, 0.995), 900)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=70,
histnorm="probability density",
name="Monte Carlo (NumPy-only)",
opacity=0.55,
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=gengamma_pdf(x_grid, a0, c0, scale=scale0),
mode="lines",
name="PDF (closed form)",
line=dict(width=3),
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=gengamma_dist.pdf(x_grid, a0, c0, scale=scale0),
mode="lines",
name="PDF (SciPy)",
line=dict(width=2, dash="dot"),
)
)
fig.update_layout(
title=f"gengamma(a={a0}, c={c0}, scale={scale0}): histogram vs PDF",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
# CDF: theoretical vs empirical
emp_x = np.sort(samples)
emp_cdf = np.arange(1, emp_x.size + 1) / emp_x.size
x_grid = np.linspace(0, np.quantile(samples, 0.995), 700)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=gengamma_cdf(x_grid, a0, c0, scale=scale0), mode="lines", name="CDF (closed form)"))
fig.add_trace(
go.Scatter(
x=emp_x[::400],
y=emp_cdf[::400],
mode="markers",
name="Empirical CDF (subsampled)",
marker=dict(size=4, opacity=0.6),
)
)
fig.update_layout(
title=f"gengamma(a={a0}, c={c0}, scale={scale0}): theoretical vs empirical CDF",
xaxis_title="x",
yaxis_title="CDF",
width=900,
height=420,
)
fig
9) SciPy Integration (scipy.stats.gengamma)#
scipy.stats.gengamma provides the usual distribution API:
- gengamma.pdf(x, a, c, loc=0, scale=1)
- gengamma.cdf(x, a, c, loc=0, scale=1)
- gengamma.rvs(a, c, loc=0, scale=1, size=..., random_state=...)
- gengamma.fit(data, ...) (MLE)
A common pattern is to freeze the distribution: rv = gengamma(a, c, loc=..., scale=...), then call rv.pdf, rv.cdf, rv.rvs, etc.
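A frozen distribution bundles the parameters once and exposes further methods not demonstrated below, such as the quantile function ppf (parameter values here are arbitrary):

```python
import numpy as np
from scipy.stats import gengamma

rv = gengamma(2.0, 1.5, loc=0.0, scale=3.0)  # frozen distribution

# ppf (the quantile function) inverts the CDF
median = rv.ppf(0.5)
assert np.isclose(rv.cdf(median), 0.5)

# a central interval containing 95% of the mass
lo, hi = rv.ppf(0.025), rv.ppf(0.975)
assert np.isclose(rv.cdf(hi) - rv.cdf(lo), 0.95)
```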
# Basic SciPy usage + fitting
a_true, c_true, loc_true, scale_true = 3.0, 1.2, 0.0, 2.0
rv_true = gengamma_dist(a_true, c_true, loc=loc_true, scale=scale_true)
data = rv_true.rvs(size=3000, random_state=rng)
# Fit with loc fixed to 0 for stability/interpretability
# (If loc is free, optimizers can sometimes trade off loc/scale.)
a_hat, c_hat, loc_hat, scale_hat = gengamma_dist.fit(data, floc=0.0)
print("true params: ", (a_true, c_true, loc_true, scale_true))
print("fitted params:", (float(a_hat), float(c_hat), float(loc_hat), float(scale_hat)))
rv_fit = gengamma_dist(a_hat, c_hat, loc=loc_hat, scale=scale_hat)
x_grid = np.linspace(1e-4, np.quantile(data, 0.995), 800)
fig = go.Figure()
fig.add_trace(go.Histogram(x=data, nbinsx=70, histnorm="probability density", name="data", opacity=0.55))
fig.add_trace(go.Scatter(x=x_grid, y=rv_true.pdf(x_grid), mode="lines", name="true pdf", line=dict(width=3)))
fig.add_trace(go.Scatter(x=x_grid, y=rv_fit.pdf(x_grid), mode="lines", name="fitted pdf", line=dict(width=2, dash="dot")))
fig.update_layout(
title="SciPy fit: fitted PDF vs true PDF",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
true params: (3.0, 1.2, 0.0, 2.0)
fitted params: (2.8464391259204764, 1.2301080320124074, 0.0, 2.133363346645424)
10) Statistical Use Cases#
Hypothesis testing (goodness-of-fit)#
If parameters are specified (not estimated from the same data), you can test whether data plausibly comes from a generalized gamma using goodness-of-fit tests such as Kolmogorov–Smirnov (KS).
Caveat: if you estimate parameters from the data and then run KS on the same data, the usual p-values are no longer exact (you need a corrected procedure or bootstrap).
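One standard correction is a parametric bootstrap: refit on data simulated from the fitted model and compare KS statistics. A hedged sketch (sample size and replicate count are illustrative and kept small for speed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.gengamma.rvs(2.5, 1.3, scale=1.7, size=300, random_state=rng)

def fit_and_ks(sample):
    """Fit gengamma (loc fixed at 0), then KS statistic against the fitted CDF."""
    params = stats.gengamma.fit(sample, floc=0.0)
    stat = stats.kstest(sample, stats.gengamma(*params).cdf).statistic
    return stat, params

obs_stat, params = fit_and_ks(data)

# Re-simulate from the fitted model, refit, and recompute the KS statistic.
boot = []
for _ in range(50):  # use many more replicates in practice
    sim = stats.gengamma.rvs(*params, size=data.size, random_state=rng)
    boot.append(fit_and_ks(sim)[0])
p_boot = float(np.mean(np.asarray(boot) >= obs_stat))
print("bootstrap p-value:", p_boot)
```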
Bayesian modeling#
The likelihood is available in closed form, so you can do Bayesian inference with generic methods. There is no simple conjugate prior for \((a,c,\mathrm{scale})\) in general, but you can still:
- infer a subset of parameters (e.g., scale) with grid methods
- use MCMC / variational inference frameworks for full Bayesian inference
Generative modeling#
gengamma is useful as a flexible positive noise model, or as a component in a mixture model when you want strictly-positive data with tunable skew/tails.
Because high-order moments may not exist for some parameter settings (especially \(c<0\)), prefer robust summaries (medians/quantiles) when exploring or evaluating fits.
# Hypothesis testing example: KS test when parameters are known.
from scipy.stats import kstest
# Generate data from the known model
rv = gengamma_dist(2.5, 1.3, scale=1.7)
x = rv.rvs(size=800, random_state=rng)
# One-sample KS test against the known CDF
stat, p_value = kstest(x, rv.cdf)
print("KS statistic:", float(stat))
print("p-value:", float(p_value))
KS statistic: 0.06359559061112985
p-value: 0.0029546837073747357
# Bayesian modeling example: grid posterior for scale with (a,c) assumed known.
# Data generated from a known model
true_a, true_c, true_scale = 3.0, 1.2, 2.0
x = gengamma_dist(true_a, true_c, scale=true_scale).rvs(size=500, random_state=rng)
# Grid over scale (log-spaced)
scale_grid = np.exp(np.linspace(np.log(0.3), np.log(5.0), 400))
loglik = np.array([np.sum(gengamma_logpdf(x, true_a, true_c, scale=s)) for s in scale_grid])
# Log-uniform prior p(scale) ∝ 1/scale -> log prior = -log(scale)
logpost_unnorm = loglik - np.log(scale_grid)
# Normalize on the grid
logpost = logpost_unnorm - np.max(logpost_unnorm)
post = np.exp(logpost)
post /= np.trapz(post, scale_grid)
scale_map = float(scale_grid[np.argmax(post)])
# 90% credible interval from the grid
cdf_post = np.cumsum(post * np.diff(scale_grid, prepend=scale_grid[0]))
lo = float(np.interp(0.05, cdf_post, scale_grid))
hi = float(np.interp(0.95, cdf_post, scale_grid))
print("true scale:", true_scale)
print("MAP scale :", scale_map)
print("90% CI :", (lo, hi))
fig = go.Figure()
fig.add_trace(go.Scatter(x=scale_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=true_scale, line_dash="dash", line_width=2, annotation_text="true")
fig.add_vline(x=scale_map, line_dash="dot", line_width=2, annotation_text="MAP")
fig.update_layout(
title="Posterior over scale (a,c known)",
xaxis_title="scale",
yaxis_title="density",
width=900,
height=420,
)
fig
true scale: 2.0
MAP scale : 1.8631642601470884
90% CI : (1.790712999885101, 1.9140843629849709)
# Generative modeling example: fit gengamma and generate synthetic samples.
# Simulate a positive dataset
rv_true = gengamma_dist(3.2, 0.9, scale=1.5)
data = rv_true.rvs(size=2000, random_state=rng)
# Fit (fix loc to 0)
a_hat, c_hat, loc_hat, scale_hat = gengamma_dist.fit(data, floc=0.0)
rv_fit = gengamma_dist(a_hat, c_hat, loc=loc_hat, scale=scale_hat)
sim = rv_fit.rvs(size=data.size, random_state=rng)
qs = np.array([0.1, 0.5, 0.9])
print("quantiles:")
print(" data:", np.quantile(data, qs))
print(" sim :", np.quantile(sim, qs))
x_grid = np.linspace(1e-4, np.quantile(data, 0.995), 800)
fig = go.Figure()
fig.add_trace(go.Histogram(x=data, nbinsx=70, histnorm="probability density", name="data", opacity=0.5))
fig.add_trace(go.Histogram(x=sim, nbinsx=70, histnorm="probability density", name="sim (fitted)", opacity=0.35))
fig.add_trace(go.Scatter(x=x_grid, y=rv_fit.pdf(x_grid), mode="lines", name="fitted pdf", line=dict(width=3)))
fig.update_layout(
title="Generative check: fitted gengamma samples vs data",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
quantiles:
data: [ 1.82517 4.77339 10.05547]
sim : [1.93852 4.65291 9.56727]
11) Pitfalls#
- Invalid parameters: \(a\le 0\), scale \(\le 0\), or \(c=0\) are not allowed; the support is \(x>\mathrm{loc}\).
- Moment existence (especially for \(c<0\)): the raw moment \(\mathbb{E}[X^k]\) exists only when \(a + k/c > 0\).
- Numerical issues for extreme values:
  - prefer logpdf/logcdf when multiplying many densities or working in the far tail
  - for large \(|c|\) or large \(x\), computing \(x^c\) can overflow/underflow; use log-space when possible
- Fitting sensitivity / identifiability: two shape parameters plus scale can lead to flat likelihood regions; consider fixing loc, using good initial values, and validating with QQ/CDF plots.
- Interpretation: loc shifts the support; if your quantity is inherently nonnegative, fixing loc=0 often yields a more interpretable fit.
12) Summary#
- gengamma is a continuous distribution on \((0,\infty)\) (or \(x>\mathrm{loc}\) with location/scale).
- Its PDF is \(f(x;a,c)=|c|\,x^{ca-1}e^{-x^c}/\Gamma(a)\) (standard form).
- The power transform \((X/\mathrm{scale})^c\) is Gamma-distributed, which yields closed-form raw moments and a simple sampling algorithm.
- Special cases include Gamma (\(c=1\)), Weibull (\(a=1, c>0\)), Exponential (\(a=1,c=1\)), and Chi (with appropriate parameters).
- Some parameter settings (notably \(c<0\)) can make higher moments diverge; use robust summaries and numerical care.
References#
E. W. Stacy (1962). “A Generalization of the Gamma Distribution.” Annals of Mathematical Statistics.
Johnson, Kotz, Balakrishnan. Continuous Univariate Distributions, Wiley.
SciPy documentation: scipy.stats.gengamma.